Goto

Collaborating Authors

 unsupervised curriculum


Unsupervised Curricula for Visual Meta-Reinforcement Learning

Neural Information Processing Systems

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast and effective reinforcement learning (RL) strategies. However, current meta-RL approaches rely on manually-defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can ``useful'' pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e. an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation which alternates between integration of recent experience into the task distribution and meta-learning of the updated tasks. Repeating this procedure leads to iterative reorganization such that the curriculum adapts as the meta-learner's data distribution shifts. Moreover, we show how discriminative clustering frameworks for visual representations can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding the pitfalls of alternatives. In experiments on vision-based navigation and manipulation domains, we show that the algorithm allows for unsupervised meta-learning that both transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient meta-learning of test task distributions.


Reviews: Unsupervised Curricula for Visual Meta-Reinforcement Learning

Neural Information Processing Systems

This paper presents a method for learning a distribution of tasks to feed to an agent that's learning via meta RL, while simultaneously optimizing the agent to perform better more quickly on tasks sampled from this distribution. The task distribution is trained using an objective that maximizes mutual information between a latent task variable and the trajectories produced by the meta RL agent. The meta RL agent is trained to maximize this mutual information, more or less. The overall optimization relies on some variational lower bounds on mutual information, and on the RL 2 algorithm for meta RL. Experiments are provided which show that the task distributions and meta RL agents trained in this co-adaptive manner exhibit some potentially useful behaviors, e.g. an improved ability to quickly solve new tasks sampled from an "actual" task distribution -- i.e., a task distribution which is not equal to the one that's co-adapted with the agent.


Reviews: Unsupervised Curricula for Visual Meta-Reinforcement Learning

Neural Information Processing Systems

This work makes progress in the unsupervised meta-learning domain with visual features. This work contributes a model for how to automatically learn useful data in an unsupervised sense, and to incorporate that into a meta learner. All three reviewers find the work novel and significant, and hence I recommend acceptance.


Unsupervised Curricula for Visual Meta-Reinforcement Learning

Neural Information Processing Systems

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast and effective reinforcement learning (RL) strategies. However, current meta-RL approaches rely on manually-defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can useful'' pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e. an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution.


Solving (Some) Formal Math Olympiad Problems

#artificialintelligence

We built a neural theorem prover for Lean that learned to solve a variety of challenging high-school olympiad problems, including problems from the AMC12 and AIME competitions, as well as two problems adapted from the IMO.[1] The prover uses a language model to find proofs of formal statements. Each time we find a new proof, we use it as new training data, which improves the neural network and enables it to iteratively find solutions to harder and harder statements. We achieved a new state-of-the-art (41.2% vs 29.3%) on the miniF2F benchmark, a challenging collection of high-school olympiad problems. Our approach, which we call statement curriculum learning, consists of manually collecting a set of statements of varying difficulty levels (without proof) where the hardest statements are similar to the benchmark we target.


Unsupervised Curricula for Visual Meta-Reinforcement Learning

Neural Information Processing Systems

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast and effective reinforcement learning (RL) strategies. However, current meta-RL approaches rely on manually-defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can useful'' pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e. an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution.